Convolutional Neural Network (CNN)-based image super-resolution (SR) has achieved impressive success on low-resolution (LR) images with known degradations. However, such approaches struggle to maintain their performance in practical scenarios where the degradation process is unknown. Although existing blind SR methods address this problem through blur kernel estimation, their perceptual quality and reconstruction accuracy remain unsatisfactory. In this paper, we analyze the degradation of a high-resolution (HR) image in terms of its intrinsic components, according to a degradation-based formulation model. We propose a components decomposition and co-optimization network (CDCN) for blind SR. First, CDCN decomposes the input LR image into structure and detail components in feature space. Then, a mutual collaboration block (MCB) is presented to exploit the relationship between the two components. In this way, the detail component provides informative features to enrich the structural context, and the structure component carries structural context for better detail revealing, in a mutually complementary manner. After that, we present a degradation-driven learning strategy to jointly supervise the restoration of HR image details and structures. Finally, a multi-scale fusion module followed by an upsampling layer is designed to fuse the structure and detail features and perform SR reconstruction. Empowered by such degradation-based components decomposition, collaboration, and mutual optimization, we bridge the correlation between component learning and degradation modelling for blind SR, thereby producing SR results with more accurate textures. Extensive experiments on both synthetic SR datasets and real-world images show that the proposed method achieves state-of-the-art performance compared with existing methods.
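The abstract does not spell out its degradation-based formulation. A common choice in blind SR is the classical model y = (x * k)↓s + n, where x is the HR image, k is a blur kernel, ↓s is s-fold downsampling, and n is additive noise. The sketch below simulates that model with NumPy. It assumes an isotropic Gaussian kernel and direct (strided) downsampling. The function names are hypothetical and do not come from CDCN.

```python
import numpy as np

def gaussian_kernel(size: int = 7, sigma: float = 1.6) -> np.ndarray:
    """Isotropic Gaussian blur kernel k (odd size), normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """2D filtering with reflect padding; equals convolution for a symmetric k."""
    r = k.shape[0] // 2
    xp = np.pad(x, r, mode="reflect")
    out = np.zeros(x.shape)
    # Accumulate shifted copies of the padded image weighted by kernel taps.
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def degrade(x, k, scale=4, noise_sigma=0.0, rng=None):
    """Classical model: y = (x * k) downsampled by `scale` + n, n ~ N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    y = blur(x, k)[::scale, ::scale]
    return y + noise_sigma * rng.standard_normal(y.shape)
```

For example, `degrade(hr, gaussian_kernel(), scale=4, noise_sigma=0.01)` turns a 64×64 HR array into a 16×16 LR observation. A blind SR method sees only that observation; k and the noise level stay unknown.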
Existing convolutional neural network (CNN)-based image super-resolution (SR) methods have achieved impressive performance on bicubic-downsampled images, but they cannot handle the unknown degradations found in real-world applications. Recent blind SR methods reconstruct SR images by relying on blur kernel estimation. However, their results still contain visible artifacts and detail distortion due to estimation errors. To alleviate these problems, in this paper, we propose an effective and kernel-free network, namely DSSR, which enables recurrent, alternating optimization of details and structures for blind SR without incorporating a blur kernel prior. Specifically, in DSSR, a detail-structure modulation module (DSMM) is built to exploit the interaction and collaboration of image details and structures. The DSMM consists of two components: a detail restoration unit (DRU) and a structure modulation unit (SMU). The former regresses the intermediate HR detail reconstruction from LR structural contexts, and the latter modulates the structural contexts conditioned on the learned detail maps in both HR and LR spaces. Besides, we use the output of DSMM as the hidden state and design the DSSR architecture from a recurrent convolutional neural network (RCNN) perspective. In this way, the network can alternately optimize image details and structural contexts, achieving co-optimization across time. Moreover, equipped with the recurrent connection, DSSR allows low- and high-level feature representations to complement each other by observing the previous HR details and contexts at every unrolling time step. Extensive experiments on synthetic datasets and real-world images demonstrate that our method achieves state-of-the-art performance against existing methods. The source code can be found at https://github.com/Arcananana/DSSR.
User-generated-content (UGC) videos have dominated the Internet in recent years. While many methods attempt to objectively assess the quality of these UGC videos, the mechanisms of human quality perception in the UGC-VQA problem remain largely unexplored. To better explain these mechanisms and learn more robust representations, we aim to disentangle the effects of aesthetic quality issues and technical quality issues arising from the complicated video generation processes in the UGC-VQA problem. To overcome the absence of separate supervision for each issue during disentanglement, we propose the Limited View Biased Supervisions (LVBS) scheme, in which two separate evaluators are trained with decomposed views specifically designed for each issue. Composed of an Aesthetic Quality Evaluator (AQE) and a Technical Quality Evaluator (TQE) under the LVBS scheme, the proposed Disentangled Objective Video Quality Evaluator (DOVER) reaches excellent performance on the UGC-VQA problem (0.91 SRCC on KoNViD-1k, 0.89 SRCC on LSVQ, 0.88 SRCC on YouTube-UGC). More importantly, our blind subjective studies show that the separate evaluators in DOVER effectively match human perception of the respective disentangled quality issues. Code and demos are released at https://github.com/teowu/dover.
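The SRCC figures above are Spearman rank-order correlation coefficients between predicted quality scores and subjective mean opinion scores (MOS). The NumPy sketch below computes that metric and gives tied values their average rank, as is conventional. It is a generic illustration rather than DOVER's evaluation code, and the helper names are hypothetical.

```python
import numpy as np

def rank_average(a):
    """1-based ranks of `a`; tied values share the average of their ranks."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        # Extend j over the run of values equal to sorted_a[i].
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def srcc(pred, mos):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rp, rm = rank_average(pred), rank_average(mos)
    rp -= rp.mean()
    rm -= rm.mean()
    return float((rp * rm).sum() / np.sqrt((rp ** 2).sum() * (rm ** 2).sum()))
```

Because SRCC only compares orderings, any monotonic rescaling of the predicted scores leaves it unchanged. In practice, `scipy.stats.spearmanr` computes the same quantity.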
Point cloud registration is a popular topic with wide applications in 3D model reconstruction, localization, and retrieval. In this paper, we propose a new registration method, KSS-ICP, which addresses rigid registration in Kendall shape space (KSS) with Iterative Closest Point (ICP). The KSS is a quotient space that removes the influences of translation, scale, and rotation for shape feature-based analysis. These influences can be summarized as similarity transformations, which do not change shape features. The point cloud representation in KSS is therefore invariant to similarity transformations, and we exploit this property to design KSS-ICP for point cloud registration. To tackle the difficulty of obtaining the KSS representation in general, KSS-ICP formulates a practical solution that requires no complex feature analysis, data training, or optimization. With a simple implementation, KSS-ICP achieves more accurate point cloud registration. It is robust to similarity transformations, non-uniform density, noise, and defective parts. Experiments show that KSS-ICP outperforms state-of-the-art methods.
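Here is a rough illustration of the idea. A point cloud can be mapped to Kendall's pre-shape space by centering it, which removes translation, and scaling it to unit Frobenius norm, which removes scale. The remaining rotation can then be estimated with a basic ICP loop whose per-iteration update is the SVD-based Kabsch solution. This sketch rests on several assumptions and is not the authors' KSS-ICP:

- It uses brute-force nearest neighbours.
- It treats rotations as proper (no reflections).
- It assumes both clouds sample the same shape.
- All names are hypothetical.

```python
import numpy as np

def to_preshape(P):
    """Remove translation (center at origin) and scale (unit Frobenius norm)."""
    C = P - P.mean(axis=0)
    return C / np.linalg.norm(C)

def best_rotation(A, B):
    """Kabsch: proper rotation R minimizing sum ||R a_i - b_i||^2 over paired rows."""
    H = A.T @ B
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip one axis if SVD gives a reflection
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T

def icp_rotation(src, dst, iters=50):
    """Rotation-only ICP about the origin (both clouds already in pre-shape space)."""
    R_total = np.eye(3)
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbour in dst for every current source point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        matched = dst[d2.argmin(axis=1)]
        R = best_rotation(cur, matched)
        cur = cur @ R.T          # apply R to row-vector points
        R_total = R @ R_total    # accumulate so that cur == src @ R_total.T
    return R_total, cur
```

Since `to_preshape` already cancels translation and uniform scaling, only a rotation is left to estimate. Like any plain ICP, this loop can still fall into a local minimum when the initial rotation is large.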
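Objective quality assessment of 3D point clouds is essential for the development of immersive multimedia systems in real-world applications. Despite the success of perceptual quality assessment for 2D images and videos, such work remains scarce for 3D point clouds, which consist of large numbers of irregularly distributed 3D points. Therefore, in this paper, we propose an objective point cloud quality index with Structure Guided Resampling (SGR) to automatically evaluate the perceptual visual quality of dense 3D point clouds. The proposed SGR is a general-purpose blind quality assessment method that requires no reference information. Specifically, considering that the human visual system (HVS) is highly sensitive to structural information, we first exploit the unique normal vectors of point clouds to perform regional preprocessing, which consists of keypoint resampling and local region construction. Then, we extract three groups of quality-related features: 1) geometry density features; 2) color naturalness features; 3) angular consistency features. Both the cognitive characteristics of the human brain and naturalness regularity are incorporated into the designed quality-aware features, which capture the most vital aspects of distorted 3D point clouds. Extensive experiments on several publicly available subjective point cloud quality databases validate that the proposed SGR can compete with state-of-the-art full-reference, reduced-reference, and no-reference quality assessment algorithms.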
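The abstract does not define its geometry density features. As a hypothetical stand-in, local point density can be summarized by the mean distance from each point to its k nearest neighbours, then pooled into global statistics. The sketch below shows that idea with brute-force NumPy. The names and the choice of k are assumptions, and this is not the SGR feature set.

```python
import numpy as np

def knn_mean_distance(points, k=8):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    points = np.asarray(points, dtype=float)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)          # exclude each point's distance to itself
    nearest = np.sort(d2, axis=1)[:, :k]  # requires k < number of points
    return np.sqrt(nearest).mean(axis=1)

def density_features(points, k=8):
    """Hypothetical global density descriptor: mean and std of local k-NN spacing."""
    m = knn_mean_distance(points, k)
    return np.array([m.mean(), m.std()])
```

Under this descriptor, a uniformly sampled surface yields a small standard deviation. Distortions such as holes or uneven downsampling spread the local spacing and raise it.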
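Owing to the development of deep neural networks, significant improvements have been made in Just Noticeable Difference (JND) modelling, especially with the recently developed unsupervised JND generation models. However, these models share a major drawback: the generated JND is assessed in the real-world signal domain rather than in the perceptual domain of the human brain. There is an obvious discrepancy between JND assessed in these two domains, because real-world visual signals are encoded by the human visual system (HVS) before being delivered to the brain. Therefore, we propose an HVS-inspired signal degradation network for JND estimation. To achieve this, we carefully analyze the HVS perceptual process in subjective JND viewing to obtain relevant insights, and then design an HVS-inspired signal degradation (HVS-SD) network to represent the signal degradation in the HVS. On the one hand, the learned HVS-SD enables us to assess JND in the perceptual domain. On the other hand, it provides more accurate prior information to better guide JND generation. Additionally, considering the requirement that a reasonable JND should not cause a shift of visual attention, a visual attention loss is proposed to control JND generation. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in accurately estimating the redundancy of the HVS. The source code will be available at https://github.com/jianjin008/hvs-sd-jnd.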
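With the rapid growth of in-the-wild videos taken by non-specialists, blind video quality assessment (VQA) has become a challenging and demanding problem. Although many efforts have been made to solve this problem, it remains unclear how the human visual system (HVS) relates to the temporal quality of videos. Meanwhile, recent work has found that the frames of natural videos, once transformed into the perceptual domain of the HVS, tend to form a straight trajectory of representations. Building on the insight that distortion impairs perceived video quality and results in a curved trajectory of the perceptual representation, we propose a temporal perceptual quality index (TPQI) that measures temporal distortion by describing the graphic morphology of the representation. Specifically, we first extract video perceptual representations from the lateral geniculate nucleus (LGN) and the primary visual area (V1) of the HVS, and then measure the straightness and compactness of their trajectories to quantify the degradation in the naturalness and content continuity of the video. Experiments show that the perceptual representation in the HVS is an effective way to predict subjective temporal quality. As a result, TPQI achieves, for the first time, performance comparable to spatial quality metrics and is even more effective in assessing videos with large temporal variations. We further demonstrate that, combined with NIQE (a spatial quality metric), TPQI achieves top performance on popular in-the-wild video datasets. More importantly, TPQI requires no information beyond the video being evaluated, so it can be applied to any dataset without parameter tuning. The source code is available at https://github.com/uolmm/tpqi-vqa.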
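Current deep video quality assessment (VQA) methods usually incur high computational costs when evaluating high-resolution videos. This cost prevents them from learning better video-quality-related representations through end-to-end training. Existing approaches typically adopt naive sampling, such as resizing and cropping, to reduce the computational cost. However, these operations visibly corrupt quality-related information in videos and are therefore not optimal for learning good representations for VQA. A new quality-retaining sampling scheme for VQA is thus much needed. In this paper, we propose Grid Mini-patch Sampling (GMS), which accounts for local quality by sampling patches at their raw resolution and covers global quality through mini-patches sampled in uniform grids. These mini-patches are spliced and aligned, and the result is called a fragment. We further build the Fragment Attention Network (FANet), specially designed to take fragments as inputs. Consisting of fragments and FANet, the proposed Fragment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos. The newly learned video-quality-related representations can also be transferred to smaller VQA datasets, boosting performance in those scenarios. Extensive experiments show that FAST-VQA performs well on inputs of various resolutions while retaining high efficiency. We release our code at https://github.com/timothyhtimothy/fast-vqa.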
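Below is a simplified sketch of grid mini-patch sampling. It makes these assumptions:

- The video is a (T, H, W, C) NumPy array.
- Each frame is partitioned into a uniform grid × grid layout.
- One raw-resolution patch is taken per cell, at a position shared across all frames so the patches stay temporally aligned.

The defaults (a 7×7 grid of 32×32 patches, giving a 224×224 fragment) are illustrative assumptions, and this is not the released FAST-VQA code.

```python
import numpy as np

def grid_mini_patch_sample(video, grid=7, patch=32, rng=None):
    """Splice one raw-resolution mini-patch per grid cell into a fragment.

    video: array of shape (T, H, W, C).
    Returns an array of shape (T, grid * patch, grid * patch, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    T, H, W, C = video.shape
    gh, gw = H // grid, W // grid            # size of each grid cell
    if gh < patch or gw < patch:
        raise ValueError("each grid cell must be at least `patch` pixels on a side")
    out = np.empty((T, grid * patch, grid * patch, C), dtype=video.dtype)
    for i in range(grid):
        for j in range(grid):
            # Random offset inside cell (i, j), reused for every frame (temporal alignment).
            y = i * gh + rng.integers(0, gh - patch + 1)
            x = j * gw + rng.integers(0, gw - patch + 1)
            # Copy pixels without resizing, so local quality cues are preserved.
            out[:, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                video[:, y:y + patch, x:x + patch]
    return out
```

For a 1920×1080 frame, a 224×224 fragment keeps about 2.4% of the pixels. This shows why fragments are far cheaper to process than full-resolution frames while still containing unresized local content.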
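The temporal relationships between frames and their influence on video quality assessment (VQA) remain under-studied in existing works. These relationships lead to two important types of effects on video quality. First, some temporal variations (such as shaking, flicker, and abrupt scene transitions) cause temporal distortions and lead to extra quality degradation, while other variations (e.g., those related to meaningful events) do not. Second, the human visual system often pays different attention to frames with different content, giving them different importance to the overall video quality. Based on the prominent time-series modeling ability of transformers, we propose a novel and effective transformer-based VQA method to tackle these two issues. To better differentiate temporal variations and thus capture temporal distortions, we design a transformer-based Spatial-Temporal Distortion Extraction (STDE) module. To address temporal quality attention, we propose an encoder-like Temporal Content Transformer (TCT). We also introduce temporal sampling on features to reduce the input length of the TCT, improving the learning effectiveness and efficiency of this module. Consisting of STDE and TCT, the proposed Temporal Distortion-Content Transformers for Video Quality Assessment (DisCoVQA) achieves state-of-the-art performance on several VQA benchmarks without any extra pre-training datasets, with up to 10% better generalization ability than existing methods. We also conduct extensive ablation experiments to demonstrate the effectiveness of each part of the proposed model, and provide visualizations showing that the proposed modules achieve our intention of modeling these temporal issues. We will release our code and pretrained weights later.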
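Deep learning models have been found to be vulnerable to adversarial examples, since small perturbations to their inputs can induce wrong predictions. Most existing works on adversarial image generation try to attack as many models as possible, while few of them make efforts to ensure the perceptual quality of the adversarial examples. High-quality adversarial examples matter for many applications, especially privacy preservation. In this work, we develop a framework based on the concept of Minimum Noticeable Difference (MND) to generate adversarial privacy-preserving images that have minimum perceptual difference from the clean images but are still able to attack deep learning models. To achieve this, an adversarial loss is first proposed so that the deep learning models are successfully attacked by the adversarial images. Then, a perceptual quality-preserving loss is developed by taking into account the magnitude of the perturbation and the perturbation-induced structural and gradient changes, with the aim of preserving high perceptual quality during adversarial image generation. To the best of our knowledge, this is the first work to explore quality-preserving adversarial image generation based on the MND concept for privacy preservation. To evaluate its performance in terms of perceptual quality, deep models for image classification and face recognition are tested with the proposed method and several anchor methods. Extensive experimental results demonstrate that the proposed MND framework generates adversarial images with remarkably improved performance metrics (e.g., PSNR, SSIM, and MOS) compared with those generated by the anchor methods.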
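PSNR, one of the metrics reported above, measures how much pixel-level distortion an adversarial perturbation introduces relative to the clean image. A higher PSNR means a less visible perturbation. The following is a minimal NumPy sketch, assuming 8-bit images (data range 255) by default. It is a generic definition, not the paper's evaluation script.

```python
import numpy as np

def psnr(ref, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    mse = np.mean((np.asarray(ref, dtype=float) - np.asarray(test, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For example, a perturbation that changes every 8-bit pixel by exactly 1 gives an MSE of 1 and a PSNR of about 48.13 dB.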